On Tests for Uniformity: Neyman's Statistic and
Statistics Based on Gaps and Stretches

By 

Herbert  Solomon  and  Michael  A.  Stephens 

Introduction. 

In this paper we provide percentage points for Neyman's goodness-of-fit
statistic of order two for the uniform distribution. Recent work has
suggested that this statistic is powerful against a wide range of
alternatives. The statistic is a combination of the sample mean and sample
variance of the observations. The percentage points have been found by
fitting Pearson curves, following the work of F. N. David (1939), who gave
the first four moments of the Neyman statistic. We then turn our attention
to another situation concerned with sampling from a uniform distribution.

Deken (1980) has produced exact distributions and moments for the largest
gaps (spacings) and stretches (higher order spacings) among points uniformly
distributed on a unit interval. An approximation to the distribution is
also suggested by Deken. We develop the Pearson curve fit for the
distribution of this maximum statistic and find that it is excellent over
the range of values and somewhat better than Deken's approximation in the
lower tail region.

The test statistic developed by Deken is powerful against alternatives to
the uniform distribution that are likely to produce several clusters among
the n points along the line. Deken suggests multiple comparison testing
as a motivation. Another possibility occurs for some physical phenomena,
e.g. Poisson processes, where events are registered on a line and the issue
at hand is usually whether there is uniformity, which would represent one
explanation, or clustering of the n points, which would indicate another
model.
Neyman's Statistic $N_2^2$.



Neyman (1937) suggested that any density f(x) on the interval (0,1)
can be written in the form

(1)   $f(x) = c \exp\{1 + \sum_{j=1}^{k} \theta_j \ell_j(x)\}, \quad 0 < x < 1, \quad k = 1, 2, \ldots,$

where $\ell_1(x), \ell_2(x), \ldots$ are Legendre polynomials,
$\theta_1, \ldots, \theta_k$ are parameters, and c, a function of
$\theta_1, \ldots, \theta_k$, is a normalizing constant.

When $\theta_j = 0$ for all $j \ge 1$, f(x) is the uniform density f(x) = 1,
written U(0,1). The Legendre polynomials are orthogonal on the interval
(0,1), and, by varying k, f(x) may be made to approximate any given
alternative. As the $\theta_j$ increase, the density f(x) varies smoothly
from the uniform distribution; thus the test for uniformity can be put
in the form of a test on the parameter values, i.e. a test of

      $H_0: \sum_{j=1}^{k} \theta_j^2 = 0.$


By likelihood ratio methods, Neyman found an appropriate statistic for
testing $H_0$. Suppose $x_1, \ldots, x_n$ is the given random sample. For
given k the test statistic is $N_k^2$, calculated as follows:

(a) Calculate

(2)   $v_j = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \ell_j(x_i), \quad j = 1, \ldots, k;$

(b) Then

(3)   $N_k^2 = \sum_{j=1}^{k} v_j^2.$
In these calculations, $\ell_j(x)$ is best expressed in terms of
y = x - 0.5. For the first four polynomials

      $\ell_1(x) = 2\sqrt{3}\,y; \quad \ell_2(x) = \sqrt{5}\,(6y^2 - 0.5);$

      $\ell_3(x) = \sqrt{7}\,(20y^3 - 3y); \quad \ell_4(x) = 3\,(70y^4 - 15y^2 + 0.375).$
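
The calculation in steps (a) and (b) can be sketched in a few lines of Python for the order-two statistic. This is a minimal illustration rather than the authors' program: the function names `legendre_terms` and `neyman_n2` are hypothetical, and only the first two polynomials listed above are used.

```python
import math


def legendre_terms(x):
    """Return (l_1(x), l_2(x)), the first two normalized Legendre
    polynomials on (0, 1), written in terms of y = x - 0.5."""
    y = x - 0.5
    l1 = 2.0 * math.sqrt(3.0) * y
    l2 = math.sqrt(5.0) * (6.0 * y * y - 0.5)
    return l1, l2


def neyman_n2(sample):
    """Neyman's statistic of order two, N_2^2 = v_1^2 + v_2^2, where
    v_j = n^(-1/2) * sum_i l_j(x_i). Function name is illustrative only."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    s1 = 0.0
    s2 = 0.0
    for x in sample:
        if not 0.0 <= x <= 1.0:
            raise ValueError("observations must lie in [0, 1]")
        l1, l2 = legendre_terms(x)
        s1 += l1
        s2 += l2
    v1 = s1 / math.sqrt(n)
    v2 = s2 / math.sqrt(n)
    return v1 * v1 + v2 * v2
```

The value returned for a sample of size n would then be compared with the upper tail percentage point for that n in Table 1, rejecting uniformity when the statistic is larger.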

In general, $H_0$ will be rejected for large values of $N_k^2$. Note that
$N_1^2$ is equivalent to $\bar{x}$, the mean of the $x_i$. In fact
$N_1^2 = v_1^2$ and $v_1 = (12n)^{1/2}(\bar{x} - 0.5)$. Let $t_\alpha$ be
the upper tail percentage point for $N_1^2$ at significance level $\alpha$,
and let $z_{\alpha U}$ and $z_{\alpha L}$ be the upper and lower tail
percentage points at level $\alpha$ for $\bar{x}$; we have
$z_{\alpha L} = 1 - z_{\alpha U}$, and
$t_{2\alpha} = 12n(z_{\alpha U} - 0.5)^2 = 12n(0.5 - z_{\alpha L})^2$.
Thus significance points for $N_1^2$ can be found from significance points
for $\bar{x}$; a table of such points is available, for example, in
Stephens (1966, Table 1). Further, $v_2$ depends on the sample only through
$\sum_i (x_i - 0.5)^2 / n = s^2$, a form of sample variance, and so $N_2^2$
is a combination of both $\bar{x}$ and $s^2$. In this paper we concentrate
on $N_2^2$.
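
The dependence on the mean and variance can be made explicit by substituting the first two polynomials into (2) and (3). The short derivation below is a worked restatement using only the definitions given above, with $s^2 = \sum_i (x_i - 0.5)^2 / n$:

```latex
\begin{align*}
v_1 &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} 2\sqrt{3}\,(x_i - 0.5)
     = \sqrt{12n}\,(\bar{x} - 0.5), \\
v_2 &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sqrt{5}\,\bigl(6(x_i - 0.5)^2 - 0.5\bigr)
     = \sqrt{5n}\,(6s^2 - 0.5), \\
N_2^2 &= v_1^2 + v_2^2
     = 12n\,(\bar{x} - 0.5)^2 + 5n\,(6s^2 - 0.5)^2 .
\end{align*}
```

Under $H_0$ the expected value of $s^2$ is 1/12, so both squared terms are centered at zero, and $N_2^2$ grows when either the mean or the spread about 0.5 departs from its uniform value.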

Neyman showed that, on $H_0$, the $v_j$ are asymptotically independent,
and each is normally distributed with mean 0 and variance 1. Thus the
asymptotic null distribution of $N_k^2$ is $\chi_k^2$; for the alternative
family (1) the asymptotic distribution is noncentral $\chi_k^2$. David
(1939) examined the null distributions of $N_1^2 = v_1^2$ and $N_2^2$ for
finite n, by calculating their moments and fitting Pearson curves. David
showed that for $n \ge 20$, the $\chi_k^2$ approximations were very good
for $N_k^2$, when k = 1 or 2. The tests are consistent and asymptotically
unbiased.
Since Neyman's early work many tests for uniformity have been
developed, and the statistic has been somewhat overlooked. In that
era before computers the statistics also required much computation,
as was pointed out by David (1939). Recently, however, Locke and
Spurrier (1978) have made extensive Monte Carlo studies of various tests
for uniformity and have shown that $N_2^2$ is effective against a wide
range of alternatives. We have also found this to be so, in an
unpublished Monte Carlo study with some alternatives the same as,
and some different from, those of Locke and Spurrier. The fact that
$N_2^2$ uses both the sample mean and the sample variance makes it
plausible that it will detect many types of non-uniformity, and these
simple statistics also have a natural appeal. It seems worthwhile
therefore to give a set of upper tail percentage points for $N_2^2$, on
$H_0$, for small values of n. This is done in Table 1. The points are
derived by fitting Pearson curves to the first four moments given by
David (1939). These moments are as follows:


      $\mu_1' = 2,$

      $\mu_2 = 4 - \frac{32}{35n},$

      $\mu_3 = 16 + \frac{704}{49n} - \frac{722208}{35035n^2},$

      $\mu_4 = 144 + \frac{15216}{49n} - \frac{2203468}{35035n^2} - \frac{17946980}{119119n^3}.$
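
As a rough illustration of how quickly these moments approach their $\chi_2^2$ limits, the following sketch evaluates the variance, third central moment and skewness for a given n. It is not part of the original computation: the function names are hypothetical, the fourth moment is left out, and the minus signs in the second and third moments are assumed (they are the signs that reproduce the n = 1 moments obtained by integrating $\ell_1^2 + \ell_2^2$ directly over (0,1)).

```python
def n2_moments(n):
    """Mean, variance and third central moment of N_2^2 under H_0,
    using the expressions for mu_1', mu_2 and mu_3 quoted in the text.
    The minus signs are an assumption checked against the n = 1 case."""
    if n < 1:
        raise ValueError("n must be a positive sample size")
    mean = 2.0
    mu2 = 4.0 - 32.0 / (35.0 * n)
    mu3 = 16.0 + 704.0 / (49.0 * n) - 722208.0 / (35035.0 * n * n)
    return mean, mu2, mu3


def n2_skewness(n):
    """Skewness mu_3 / mu_2^(3/2); the chi-square(2) limit is 2."""
    _, mu2, mu3 = n2_moments(n)
    return mu3 / mu2 ** 1.5
```

Calling `n2_skewness(n)` for increasing n shows the skewness moving toward 2, the value for $\chi_2^2$, which is in line with the remark that the asymptotic approximation is already good for moderate n.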
David used two moments, or three moments with a zero start, to approximate
the distribution of $N_k^2$ by Pearson curves of Type I for $N_1^2$ and
Type VI for $N_2^2$, for n = 5, 10, 20, 30, 50, 100, and concluded that
these approximations gave very good results. A table of percentage points
was not given, however (it would have been very tedious to calculate at the
time), and the present Table 1 might therefore be regarded as an extension
of David's work, making use of modern capabilities. The Pearson curve
approximations are based on the extensive tables of significance points
produced by Johnson, Nixon, Amos and Pearson (1963) and reproduced in
Pearson and Hartley (1972). David showed that the asymptotic $\chi_2^2$
approximation will be accurate for quite small n, and Table 1 demonstrates
this also. The last row of percentage points is obtained from $\chi_2^2$.
Further comments on the Neyman tests are in Pearson (1938) and David (1939).
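
The limiting row of Table 1 can be reproduced directly, because the $\chi_2^2$ distribution has the closed-form distribution function $1 - e^{-x/2}$. The sketch below inverts it; the function name `chi2_2_quantile` is illustrative only.

```python
import math


def chi2_2_quantile(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom
    at cumulative probability p, from F(x) = 1 - exp(-x / 2)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -2.0 * math.log(1.0 - p)


# Cumulative probabilities used as column headings in Table 1.
levels = [0.5, 0.75, 0.8, 0.9, 0.95, 0.975, 0.99, 0.995]
limit_row = [round(chi2_2_quantile(p), 3) for p in levels]
```

Comparing `limit_row` with the finite-n rows of Table 1 gives a quick view of how far each percentage point is from its asymptotic value.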

Barton (1953) considered a slightly different class of alternatives
given by

      $f(x) = \sum_{j=0}^{k} \theta_j \ell_j(x), \quad 0 \le x \le 1, \quad k = 0, 1, \ldots,$

with $\theta_0$ equal to 1. A restriction must now be placed on the
$\theta_j$ to ensure that the density is always positive. The same
statistics may again be used to test for uniformity against this
alternative.


TABLE 1

Upper tail percentage points for $N_2^2$.

                          Cumulative probability
   n     0.5    0.75     0.8     0.9    0.95   0.975    0.99   0.995

   2   1.571   2.702   3.058   4.193   5.411   6.748   8.741  10.454
   3   1.492   2.685   3.069   4.292   5.594   6.998   9.041  10.749
   4   1.459   2.689   3.086   4.352   5.688   7.109   9.143  10.810
   5   1.442   2.696   3.102   4.393   5.745   7.172   9.190  10.827
   6   1.432   2.704   3.116   4.422   5.784   7.212   9.215  10.826
   7   1.425   2.710   3.126   4.445   5.812   7.239   9.227  10.815
   8   1.420   2.716   3.135   4.462   5.833   7.257   9.231  10.798
   9   1.416   2.721   3.143   4.476   5.849   7.272   9.235  10.787
  10   1.413   2.725   3.149   4.487   5.862   7.283   9.235  10.773
  12   1.409   2.731   3.159   4.505   5.883   7.300   9.237  10.755
  14   1.406   2.736   3.166   4.517   5.897   7.311   9.235  10.735
  16   1.403   2.740   3.172   4.527   5.908   7.319   9.233  10.720
  18   1.402   2.744   3.177   4.536   5.918   7.327   9.235  10.716
  20   1.400   2.746   3.181   4.542   5.925   7.332   9.234  10.706
  25   1.398   2.751   3.188   4.554   5.937   7.341   9.230  10.684
  30   1.396   2.755   3.193   4.562   5.947   7.348   9.230  10.677
  35   1.395   2.757   3.196   4.568   5.952   7.352   9.226  10.662
  40   1.394   2.759   3.199   4.573   5.958   7.357   9.230  10.666
  45   1.393   2.760   3.201   4.576   5.961   7.357   9.221  10.645
  50   1.392   2.762   3.203   4.579   5.964   7.360   9.223  10.646
  60   1.391   2.763   3.206   4.584   5.969   7.364   9.224  10.644
  80   1.390   2.766   3.209   4.589   5.974   7.367   9.218  10.627
 100   1.390   2.768   3.212   4.592   5.979   7.370   9.220  10.626
   ∞   1.386   2.773   3.219   4.605   5.991   7.378   9.210  10.597
Gaps and Stretches.

In an interesting paper, Deken (1980) has looked into the distribution
of gaps and stretches that arise in sampling from a uniform distribution
over the unit interval. For densities other than the uniform, an
appropriate probability integral transformation can be employed to
achieve a uniform distribution. Consider the order statistics
$x_{(1)}, \ldots, x_{(n)}$ in a sample of size n from the uniform
distribution. Define the p-stretches $z_1, z_2, \ldots, z_{n+1-p}$ by
$z_i = x_{(i+p-1)} - x_{(i)}$, $i = 1, \ldots, n+1-p$. The
variables $z_i$ are called spacings (p = 2), or higher order spacings
(p > 2). There is an extensive literature on spacings, and in some
cases this literature deals with the classical geometrical probability
problems of random coverage of the circumference of a circle by random
arcs. There is a duality between distributions related to spacings
and distributions of coverage. A recent article by Holst (1980) gives
some new results in this interesting subject as well as many references.
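
A small sketch may help fix the definition of the p-stretches and of the maximum p-stretch studied below. It assumes the convention $z_i = x_{(i+p-1)} - x_{(i)}$, i.e. a p-stretch spans p consecutive order statistics, which is consistent with there being n+1-p stretches and with p = 2 giving the ordinary spacings; the function names are hypothetical.

```python
def p_stretches(sample, p):
    """Return the n + 1 - p p-stretches of a sample: the distances spanned
    by each run of p consecutive order statistics (assumed convention)."""
    xs = sorted(sample)
    n = len(xs)
    if not 2 <= p <= n:
        raise ValueError("p must satisfy 2 <= p <= n")
    return [xs[i + p - 1] - xs[i] for i in range(n + 1 - p)]


def max_p_stretch(sample, p):
    """Largest p-stretch; for p = n this is the sample range."""
    return max(p_stretches(sample, p))
```

For the sample (0.9, 0.1, 0.5, 0.4), for instance, the 2-stretches are the three gaps between neighbouring ordered points, and the single 4-stretch is the range.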

In some recent work on multiple comparisons, Welsh (1977) looks into
the variables $z_i$ and labels them gaps for p = 2 and stretches for
p > 2. Deken's key contribution is to deal directly with the lack of
independence between successive p-stretches. This is rather formidable for
p > 2 and had led in the past to asymptotic considerations. The multiple
comparison situation is one in which asymptotic results may not be
sufficiently accurate and in which the number of points n may often be
small. Deken demonstrates through a recursive formulation how to derive
exact results in many cases. Specifically, he computed the exact
distribution of the maximum p-stretch for all values of p for ten or
fewer points uniformly distributed in the unit interval. In addition to
the exact distributions, he provides formulas for moments for
n = 2, 3, ..., 10, p = 2, 3, ..., 10, and quantiles of the distribution.
He also produces an approximation based on an independence assumption for
the successive p-stretches, utilizing the fact that the distribution of any
individual p-stretch is Beta when sampling is from the uniform
distribution. Therefore, an approximation to the exact distributions
computed by Deken is that of the maximum of (n+1-p) independent Beta
variables.
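
The independence approximation just described can be sketched numerically. The code below is an illustration under stated assumptions, not Deken's procedure: it takes an individual p-stretch of n uniform points to be Beta(p-1, n-p+2) (the standard distribution of a difference of uniform order statistics), treats the n+1-p stretches as independent, and solves for the quantile by bisection. The function names are hypothetical, and the Beta distribution function is evaluated through its binomial-tail identity so that only the standard library is needed.

```python
import math


def beta_cdf_int(z, a, b):
    """CDF of Beta(a, b) for positive integers a and b, using
    I_z(a, b) = P(Binomial(a + b - 1, z) >= a)."""
    m = a + b - 1
    return sum(math.comb(m, j) * z ** j * (1.0 - z) ** (m - j)
               for j in range(a, m + 1))


def approx_max_stretch_quantile(q, n, p, tol=1e-10):
    """Quantile at cumulative probability q of the maximum of n + 1 - p
    independent Beta(p - 1, n - p + 2) variables (independence assumed)."""
    if not 2 <= p <= n:
        raise ValueError("p must satisfy 2 <= p <= n")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    m = n + 1 - p                 # number of p-stretches
    target = q ** (1.0 / m)       # required CDF level for a single stretch
    a, b = p - 1, n - p + 2
    lo, hi = 0.0, 1.0
    while hi - lo > tol:          # bisection on the increasing Beta CDF
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With n = 5 and p = 2 this sketch gives values close to the "Approx." entries of Table 2, for example roughly 0.698 at the 0.99 level and roughly 0.120 at the 0.05 level.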

In Table 2 quantiles are listed for several cases; each cell lists the
exact value, the Pearson curve fit, and the approximation given by the
Beta assumption. For the columns where the number of points is five and
the stretch is five, and where the number of points is ten and the stretch
is ten, no approximate values are given because we are dealing directly
with the distribution of the range. The Pearson curve fits do extremely
well over all cells, whereas the Beta approximation is only viable for the
upper tail.

From Deken's development, it appears that for n > 10, the analytical
development of moments is more feasible than producing the quantiles of
the exact distribution of the maximum p-stretches. Thus, should moments
become available for n > 10, the Pearson curve fit can be achieved rather
economically and with the knowledge that the fit will be excellent.
TABLE 2

Percentage points for the maximum p-stretch in samples
of size n from the uniform distribution.

                   n=5     n=5     n=5     n=5    n=10    n=10
   α               p=2     p=3     p=4     p=5     p=9    p=10

  .05   Exact    .1570   .2308   .2897   .3426   .5635   .6058
        P.C.     .1850   .2321   .2903   .3426   .5637   .6059
        Approx.  .1202   .2506   .3425     -     .6316     -

  .10   Exact    .1917   .2811   .3525   .4161   .6176   .6632
        P.C.     .1904   .2811   .3527   .4161   .6176   .6632
        Approx.  .1523   .2963   .3993     -     .6738     -

  .25   Exact    .2555   .3716   .4649   .5458   .7033   .7526
        P.C.     .2530   .3700   .4645   .5458   .7032   .7527
        Approx.  .2178   .3813   .5000     -     .7414     -

  .50   Exact    .3340   .4754   .5914   .6862   .7878   .8377
        P.C.     .3347   .4762   .5915   .6862   .7878   .8377
        Approx.  .3076   .4854   .6144     -     .8098     -

  .75   Exact    .4254   .5839   .7100   .8062   .8577   .9036
        P.C.     .4280   .5852   .7102   .8063   .8578   .9035
        Approx.  .4135   .5944   .7230     -     .8686     -

  .90   Exact    .5218   .6832   .8024   .8878   .9069   .9455
        P.C.     .5207   .6814   .8022   .8878   .9069   .9455
        Approx.  .5181   .6905   .8089     -     .9118     -

  .95   Exact    .5837   .7393   .8488   .9236   .9301   .9632
        P.C.     .5795   .7364   .8484   .9235   .9300   .9633
        Approx.  .5821   .7445   .8527     -     .9329     -

  .99   Exact    .6983   .8311   .9159   .9673   .9621   .9845
        P.C.     .6947   .8316   .9159   .9672   .9616   .9847
        Approx.  .6982   .8333   .9171     -     .9630     -
Distributions of two statistics arising in sampling from a
uniform distribution are investigated. They are Neyman's smooth
or stretch between p points in a sample of size n, p ≤ n. V
the stretch statistic. The Pearson curve values are excellent
approximations along the range of values of each statistic.